group 3
Unsupervised Machine-Learning Pipeline for Data-Driven Defect Detection and Characterisation: Application to Displacement Cascades
Del Fré, Samuel, de Backer, Andrée, Domain, Christophe, Thuinet, Ludovic, Becquart, Charlotte S.
Neutron irradiation produces, within a few picoseconds, displacement cascades that are sequences of atomic collisions generating point and extended defects which subsequently affects the long-term evolution of materials. The diversity of these defects, characterized morphologically and statistically, defines what is called the "primary damage". In this work, we present a fully unsupervised machine learning (ML) workflow that detects and classifies these defects directly from molecular dynamics data. Local environments are encoded by the Smooth Overlap of Atomic Positions (SOAP) vector, anomalous atoms are isolated with autoencoder neural networks (AE), embedded with Uniform Manifold Approximation and Projection (UMAP) and clustered using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). Applied to 80 keV displacement cascades in Ni, Fe$_7$0Ni$_{10}$Cr$_{20}$, and Zr, the AE successfully identify the small fraction of outlier atoms that participate in defect formation. HDBSCAN then partitions the UMAP latent space of AE-flagged SOAP descriptors into well defined groups representing vacancy- and interstitial-dominated regions and, within each, separates small from large aggregates, assigning 99.7 % of outliers to compact physical motifs. A signed cluster-identification score confirms this separation, and cluster size scales with net defect counts (R2 > 0.89). Statistical cross analyses between the ML outlier map and several conventional detectors (centrosymmetry, dislocation extraction, etc.) reveal strong overlap and complementary coverage, all achieved without template or threshold tuning. This ML workflow thus provides an efficient tool for the quantitative mapping of structural anomalies in materials, particularly those arising from irradiation damage in displacement cascades.
Cyber Racing Coach: A Haptic Shared Control Framework for Teaching Advanced Driving Skills
Shen, Congkai, Yu, Siyuan, Weng, Yifan, Ma, Haoran, Li, Chen, Yasuda, Hiroshi, Dallas, James, Thompson, Michael, Subosits, John, Ersal, Tulga
Abstract--This study introduces a haptic shared control framework designed to teach human drivers advanced driving skills. In this context, shared control refers to a driving mode where the human driver collaborates with an autonomous driving system to control the steering of a vehicle simultaneously. Advanced driving skills are those necessary to safely push the vehicle to its handling limits in high-performance driving such as racing and emergency obstacle avoidance. Previous research has demonstrated the performance and safety benefits of shared control schemes using both subjective and objective evaluations. However, these schemes have not been assessed for their impact on skill acquisition on complex and demanding tasks. Prior research on long-term skill acquisition either applies haptic shared control to simple tasks or employs other feedback methods like visual and auditory aids. T o bridge this gap, this study creates a cyber racing coach framework based on the haptic shared control paradigm and evaluates its performance in helping human drivers acquire high-performance driving skills. The framework introduces (1) an autonomous driving system that is capable of cooperating with humans in a highly performant driving scenario; and (2) a haptic shared control mechanism along with a fading scheme to gradually reduce the steering assistance from autonomy based on the human driver's performance during training. Two benchmarks are considered: self-learning (no assistance) and full assistance during training. Results from a human subject study indicate that the proposed framework helps human drivers develop superior racing skills compared to the benchmarks, resulting in better performance and consistency. Advanced driving skills refer to a set of competencies that go beyond basic driving abilities in terms of situational awareness, hazard perception, risk management, and vehicle handling [1]. They are crucial in high-performance driving tasks such as racing, and can also improve safety in everyday driving [1], [2]. This work has been submitted to the IEEE for possible publication.
Cross-Platform DNA Methylation Classifier for the Eight Molecular Subtypes of Group 3 & 4 Medulloblastoma
Abid, Omer, Rafiee, Gholamreza
Omer Abid, Gholamreza Rafiee * Abstract -- Medulloblastoma is a malignant pediatric brain cancer, and the discovery of molecular subgroups is enabling personalized treatment strategies. In 2019, a consensus identified eight novel subtypes within Groups 3 and 4, each displaying heterogeneous chara cteristics. Classifiers are essential for translating these findings into clinical practice by supporting clinical trials, personalized therapy development and application, and patient monitoring. This study presents a DNA methylation - based, cross - platform machine learning classifier capable of distinguishing these subtypes on both HM450 and EPIC methylation array samples . Across two independent test sets, the model achieved weighted F1 = 0.95 and balanced accuracy = 0.957, consistent across platforms. As the first cross - platform solution, it provides backward compatibility while extending applicability to a newer platform, also enhancing accessibility. It also has the potential to become the first publicly available classifier for these subtypes once deployed through a web application, as planned in the future . Th is work overall takes steps in the direction of advancing precision medicine and improving clinical outcomes for patients within the majority prevalence medulloblastoma subgroups, g roups 3 and 4. Keywords -- Medulloblastoma, Molecular Subgroup Classification, Machine Learning, AI for Health Medulloblastoma is a malignant brain cancer widely known for its prevalence in children. Through extensive treatment strategies based on surgery, chemotherapy and radiation therapy, approximately 75% of the patient are able to survive in the long term [1]. These treatments whi le crucial also come along with negative side effects, effecting patients' li ves [1] [2], especially considering the implications on the growing children. However, with advancement in genomics, molecular subgroups have been discover ed within the disease . T hese subgroups have shown to be heterogenous in clinical, biological and outcomes perspective [3] . These in fact are now considered better definition of disease behaviour than conventional techniques [3] .
Predicting person-level injury severity using crash narratives: A balanced approach with roadway classification and natural language process techniques
Majidi, Mohammad Zana, Karimi, Sajjad, Wang, Teng, Kluger, Robert, Souleyrette, Reginald
Predicting injuries and fatalities in traffic crashes plays a critical role in enhancing road safety, improving emergency response, and guiding public health interventions. This study investigates the added value of unstructured crash narratives (written by police officers at the scene) when combined with structured crash data to predict injury severity. Two widely used Natural Language Processing (NLP) techniques, Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec, were employed to extract semantic meaning from the narratives, and their effectiveness was compared. To address the challenge of class imbalance, a K-Nearest Neighbors-based oversampling method was applied to the training data prior to modeling. The dataset consists of crash records from Kentucky spanning 2019 to 2023. To account for roadway heterogeneity, three road classification schemes were used: (1) eight detailed functional classes (e.g., Urban Two-Lane, Rural Interstate, Urban Multilane Divided), (2) four broader paired categories (e.g., Urban vs. Rural, Freeway vs. Non-Freeway), and (3) a unified dataset without classification. A total of 102 machine learning models were developed by combining structured features and narrative-based features using the two NLP techniques alongside three ensemble algorithms: XGBoost, Random Forest, and AdaBoost. Results demonstrate that models incorporating narrative data consistently outperform those relying solely on structured data. Among all combinations, TF-IDF coupled with XGBoost yielded the most accurate predictions in most subgroups. The findings highlight the power of integrating textual and structured crash information to enhance person-level injury prediction. This work offers a practical and adaptable framework for transportation safety professionals to improve crash severity modeling, guide policy decisions, and design more effective countermeasures.
VAE-GAN Based Price Manipulation in Coordinated Local Energy Markets
Mukherjee, Biswarup, Zhou, Li, Krishnan, S. Gokul, Kabirifar, Milad, Lakshminarayana, Subhash, Konstantinou, Charalambos
This paper introduces a model for coordinating prosumers with heterogeneous distributed energy resources (DERs), participating in the local energy market (LEM) that interacts with the market-clearing entity. The proposed LEM scheme utilizes a data-driven, model-free reinforcement learning approach based on the multi-agent deep deterministic policy gradient (MADDPG) framework, enabling prosumers to make real-time decisions on whether to buy, sell, or refrain from any action while facilitating efficient coordination for optimal energy trading in a dynamic market. In addition, we investigate a price manipulation strategy using a variational auto encoder-generative adversarial network (VAE-GAN) model, which allows utilities to adjust price signals in a way that induces financial losses for the prosumers. Our results show that under adversarial pricing, heterogeneous prosumer groups, particularly those lacking generation capabilities, incur financial losses. The same outcome holds across LEMs of different sizes. As the market size increases, trading stabilizes and fairness improves through emergent cooperation among agents.
Facility Location with Public Locations and Private Doubly-Peaked Costs
In the facility location problem, the task is to place one or more facilities so as to minimize the sum of the agent costs for accessing their nearest facility. Heretofore, in the strategic version, agent locations have been assumed to be private, while their cost measures have been public and identical. For the most part, the cost measure has been the distance to the nearest facility. However, in multiple natural settings, such as placing a firehouse or a school, this modeling does not appear to be a good fit. For it seems natural that the agent locations would be known, but their costs might be private information. In addition, for these types of settings, agents may well want the nearest facility to be at the right distance: near, but not too near. This is captured by the doubly-peaked cost introduced by Filos-Ratsikas et al. (AAMAS 2017). In this paper, we re-examine the facility location problem from this perspective: known agent locations and private preferred distances to the nearest facility. We then give lower and upper bounds on achievable approximations, focusing on the problem in 1D, and in 2D with an $L_1$ distance measure.
Using customized GPT to develop prompting proficiency in architectural AI-generated images
Rodriguez, Juan David Salazar, Joyce, Sam Conrad, Julfendi, null
This research investigates the use of customized GPT models to enhance prompting proficiency among architecture students when generating AI-driven images. Prompt engineering is increasingly essential in architectural education due to the widespread adoption of generative AI tools. This study utilized a mixed-methods experimental design involving architecture students divided into three distinct groups: a control group receiving no structured support, a second group provided with structured prompting guides, and a third group supported by both structured guides and interactive AI personas. Students engaged in reverse engineering tasks, first guessing provided image prompts and then generating their own prompts, aiming to boost critical thinking and prompting skills. Variables examined included time spent prompting, word count, prompt similarity, and concreteness. Quantitative analysis involved correlation assessments between these variables and a one-way ANOVA to evaluate differences across groups. While several correlations showed meaningful relationships, not all were statistically significant. ANOVA results indicated statistically significant improvements in word count, similarity, and concreteness, especially in the group supported by AI personas and structured prompting guides. Qualitative feedback complemented these findings, revealing enhanced confidence and critical thinking skills in students. These results suggest tailored GPT interactions substantially improve students' ability to communicate architectural concepts clearly and effectively.
Concept Space Alignment in Multilingual LLMs
Multilingual large language models (LLMs) seem to generalize somewhat across languages. We hypothesize this is a result of implicit vector space alignment. Evaluating such alignment, we see that larger models exhibit very high-quality linear alignments between corresponding concepts in different languages. Our experiments show that multilingual LLMs suffer from two familiar weaknesses: generalization works best for languages with similar typology, and for abstract concepts. For some models, e.g., the Llama-2 family of models, prompt-based embeddings align better than word embeddings, but the projections are less linear -- an observation that holds across almost all model families, indicating that some of the implicitly learned alignments are broken somewhat by prompt-based methods.